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In this paper, we review the main problem concerning the calculation of X-ray scattering of simulated model 
systems, namely their finite size. A novel method based on the Rayleigh-Debye-Gans approximation was 
derived, which allows sidestepping this issue by complementing the missing surroundings of each particle with 
an average image of the system. The method was designed to operate directly on particle configurations without 
an intermediate step (e.g., calculation of pair distribution functions): in this way, all information contained in the 
configurations was preserved. A comparison of the results against those of other known methods showed that 
the new method combined several favourable properties: an arbitrary g-scale, scattering curves free of truncation 
artifacts and good behaviour down to the theoretical lower limit of the g-scale. A test of computational efficiency 
was also performed to establish a relative scale between the speeds of all known methods: the reciprocal lattice 
approach, the brute force method, the Fourier transform approach and the newly presented complemented 
system approach. 
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I. INTRODUCTION 

The theoretical foundations of the theory of X-ray scatter- 
ing were laid down nearly a century ago in the form of the 
Rayleigh-Debye-Gans approximation. " Although more gen- 
eral theories were developed later, this approach still remains 
popular and is widely used to interpret and evaluate the results 
of various scattering experiments. The theory establishes a 
Fourier transform relation between the real space (the sam- 
ple) and the reciprocal space (the scattering pattern). Mov- 
ing between these two might therefore seem straightforward; 
however, in practice, going either way brings its own set of 
difficulties. This discussion focuses on the problem of calcu- 
lating the scattering pattern of a model system constructed by 
computer simulation. 

The main obstacle in such a calculation is the finiteness of 
the model system. The choice of the volume of the simulation 
cell is frequently rather limited — at the lower end, it is bounded 
by the maximum range of the correlations in the simulated 
system, while at the upper, the available CPU time is usually 
the limiting factor. At the first sight, one might be tempted 
to disregard the difference between the space equipped with 
periodic boundary conditions and the real (physical) space, and 
apply the Fourier transform to the raw particle configurations 
obtained from a simulation. However, such a naive approach 
leads to severe truncation artifacts which completely engulf 
the relevant part of the scattering pattern and render the results 
unusable. 

Various methods have been developed to overcome this prob- 
lem: the most direct one is the reciprocal lattice approach, in 
which an infinite system is constructed by stacking simula- 
tion cells into a pseudo-crystal. In consequence, the resulting 
scattering pattern consists of discrete scattering peaks whose 



positions depend only on the cell side length. While the results 
are otherwise favourable, such discreteness is clearly not a 
feature of the simulated system, but rather a consequence of 
the way the method deals with periodic boundary conditions. 

Another method (the so-called "brute force approach"), 
which we developed recently, applies the Debye equation to a 
set of polydisperse cubic cut-outs of the system and combines 
their scattering patterns in order to suppress the truncation 
artifacts. In contrast to the reciprocal lattice approach, this 
method yields scattering intensities at arbitrary angles and is 
thus more in accord with the true nature of the system. But 
while providing relatively satisfactory results, the numerical 
suppression of truncation artifacts is found to be less than 
optimal, leaving some remains visible in the form of small 
oscillations on the scattering curve, particularly at small angles. 
This is to be expected, since the method is based mainly on 
numerical grounds. 

In the third — and perhaps most commonly used — method, 
the structure factor for each pair of atom types is first deter- 
mined by Fourier-transforming the corresponding pair distribu- 
tion function. The total scattering is then calculated by multiply- 
ing these structure factors by appropriate particle form factors 
and summing up the results. Application of this method to 
the results of a computer simulation requires determination 
of the various pair distribution functions from the simulation 
data, preferably with good resolution and sufficiently small 
statistical uncertainty. This implies some sort of binning which 
inherently discards a part of the available information. 

In the following discussion, we propose a method that ap- 
plies the Debye equation directly to configuration snapshots 
in order to preserve all the information contained within them, 
while also compensating for the finite size of the simulation 
cell by complementing the system in a physically sensible way. 
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II. THEORY 

The differential scattering cross-section per unit volume of 
a sample containing discrete particles is given by the Debye 
equation 7 
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where q represents the scattering vector, V is the volume of the 
sample, N is the number of particles in the system, indices j 
and k denote the j-th and the k-th particle, bj(q) and b/ ( (q) are 
their scattering lengths and rj and their position vectors. The 
angle brackets denote the canonical average. Let us assume 
now that all the particles are spherical and that there are T types 
of them; their corresponding numbers shall be designated by 
N\,N2,... ,Nt- The right-hand side of equation (1) can thus be 
rewritten as follows 
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Integration is performed over the sample volume, the first two 
sums go over the atom types; 8(r) is the Dirac delta function, 
subscript A:j means the j-th atom of type A and subscript 
B:k means the k-th atom of type B. For spherical particles, 
scattering lengths depend only on the atom type — since they 
are independent of the particles' orientation — and can be taken 
out of the integrals. Some further rearrangement results in 
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The quantity in angle brackets can be transformed into 
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where pi (r') and pQ (r',r") are the one- and two-particle 
number densities and 8a,b is the Kronecker delta. Let us impose 
a further restriction, namely that the system be homogeneous. 
For such a case, one has 
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i.e., the one-particle density of type A is equal to the aver- 
age density of particles of type A, Pa, and the two-particle 
density of types A and B depends only on the difference be- 
tween position vectors r' and r" (denoted below by r). One 



of the integrations in equation (3) can be therefore performed 
explicitly: 
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By introducing the pair correlation functions g^g(r) 
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equations (3) and (7) can be combined into 
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Frequently, the systems considered in X-ray scattering exper- 
iments are not only homogeneous, but also isotropic — meaning 
that both the functions in the above equation and the result itself 
depend only on the length of the scattering vector q = \q\. This 
allows one to perform rotational averaging on equations ( 1 ) 
and (9), yielding 
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where is the distance between the j-th and ^-th atom and 
r = |r|. The right-hand side of this equation shows that one 
can calculate the scattering of a system if all pair distribution 
functions gAB( r ) are known. This approach (called the Fourier 
transform approach in this discussion) is especially viable if 
analytical solutions to gAB{f) can be found. A slightly modified 
version of the above equation is ordinarily used: 
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The term (2n) 3 8(q) represents forward scattering, i.e., radia- 
tion scattered in the direction of the primary beam. Since this 
contribution cannot be measured, it is of little practical value 
and is thus frequently omitted from the result. 
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The Fourier transform approach can also be used with gAB(r) 
obtained from simulation data, provided that the resolution of 
these functions in the r-domain is high enough. 

Equation (10) reveals very clearly the origin of the unwanted 
truncation effects that arise when calculating the scattering 
from simulation data via the Debye equation; failing to take 
into account the terms with Rj k > r c (where r c is the cut-off 
distance, usually half of the simulation cell's side length) is 
equivalent to setting all pair distribution functions gAn( r ) to 
zero for r>r c . From the viewpoint of a particle in the centre of 
the cell, the density at large distances does not tend towards its 
average value within the system, but instead drops off sharply 
to zero when crossing the boundary of the cell. Unfortunately, 
a computer simulation does not provide any information about 
correlations over distances larger than r c ; however, a widely 
used criterion for selecting an appropriate simulation cell size 
is that over a distance r c — L/2 (L being the cell's side length), 
all interparticle correlations should vanish. A best guess would 
therefore be that at r > r c , all pair correlation functions assume 
a constant value equal to their theoretical limiting value at large 
distances. A set of complemented pair distribution functions 
%g(r) is then constructed: 



The second term in equation (13) can be written as follows: 
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where gT£(r) are the pair distribution functions obtained from 
the simulation data. The limiting values of all pair distribution 
functions are 1 by definition (strictly speaking, lim r ^oo gAB(r) 
equals 1 when A ^ B and 1 — 1/Na when A = B, but since 
we are considering the limiting value of gAB{ r ) m the thermo- 
dynamic sense, i.e., in an infinitely large system, Na grows 
over all limits and the term 1 /Na vanishes). Inserting Yab{t) in 
place of gAs{ r ) m the right-hand side of equation (10), we get 
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Although the first of the above integrals cannot be evaluated 
in the Riemann sense, it can be regarded as a distribution and 
is, in that respect, equivalent to [2n) i 8{q). Inserting equations 
(14) and (15) into equation (13), we get 
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The first term is the familiar Debye relation, but with only inter- 
particle distances of less than r c taken into account: this term 
represents the scattering of the explicit part of the system (i.e., 
the simulated particles). This contribution is complemented by 
the second term which arises from light interference between 
the explicit part and the averaged surroundings. A real-world 
example of these two contributions is shown later in the text 
in figure 2. The term (2n) 3 8(q) again represents the forward 
scattering, which is usually of no interest and can be omitted 
altogether at q > 0. 
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H{r) is the Heaviside step function. Note that the first term of 
the above equation differs from the right-hand side of equa- 
tion (10) only in discarding distances larger than r c . One could 
therefore get the same result by discarding all Rj k > r c in the 
left-hand side of equation (10) 
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III. NUMERICAL TESTS 

In order to evaluate the characteristics of the complemented 
system approach compared to other aforementioned methods, 
a test run was conducted using the simulation data from our 
dr. (13) previous study on aldehydes. The TraPPE-UA (Transferable 
Potential for Phase Equilibria — United Atom) force field 
was used to model the aldehyde. In this model, CH3, CH2 and 
CH groups are treated as single sites: an aldehyde molecule 
therefore consists of a linear chain of united CH X atoms with 
an oxygen atom bonded to the terminal one. 

The simulation run comprised 300 pentanal molecules at 
25 °C in a simulation box with a side length of 37.7025 A. 
The configurational bias Monte Carlo technique was employed. 
First, the system was equilibrated for 10000 cycles (one cycle 
consisted of 300 Monte Carlo steps); after that, a production 
run of 40000 cycles followed. During this run, a snapshot 
of the configuration was saved each 400 steps; a total of 100 
independent configuration snapshots were obtained this way. 
X-ray scattering curves were calculated directly from the 
(14) simulation data by each respective method. In the Fourier trans- 
form method, the pair distribution functions were estimated 
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FIG. 1. Comparison of results: the scattering curve of pentanal calcu- 
lated via different methods: the reciprocal lattice approach (circles), 
the brute force method (dashed line) and the complemented system 
approach (solid line). 



by binning the interparticle distances. To evaluate the impact 
of the bin width on the results, four different bin widths were 
used: 0.5 A, 0.2 A, 0.1 A and 0.05 A. 

The computational efficiency of the methods was assessed 
by measuring the time needed to process a set of 100 configu- 
rations, each of them comprising N particles. A series of such 
sets was generated, with N varying between one hundred and 
several thousand. For the Fourier transform (FT) approach, the 
brute force (BF) method and the complemented system (CS) 
approach, a g-range of A -1 to 2.5 A -1 in steps of 0.05 A -1 
was used (totalling 50 points on the scattering curve), while for 
the reciprocal lattice (RL) approach, the Miller indices were 
limited to h,k,l < 10, leading to 85 points within a g-range of 
0.17 A -1 to 1.7 A -1 . Pair distribution functions for use in FT 
were collected with a bin width of 0. 1 A. The algorithms were 
implemented as follows: BF as a FORTRAN program, RL and 
CS as a C program and FT as a GNU Octave script. 



IV. RESULTS AND DISCUSSION 

The results of the reciprocal lattice approach, the brute force 
method and the complemented system approach are shown 
in figure 1 . It is evident that the result of the CS almost com- 
pletely matches the result of the RL with the only discernible 
difference arising at very low angles (q less than approximately 
0.3 A~ l ). It must be taken into account, though, that the simula- 
tion reproduces only correlations over distances below one-half 
of the simulation cell's side length. (In fact, distances between 
L/2 and Ly/3/2, L being the simulation cell's side length, are 
also reproduced, but with lesser accuracy. Nevertheless, one 
must keep in mind that the CS method, at least in the present 
implementation, explicitly ignores distances longer than L/2.) 
Any scattering curve calculated from simulation data is there- 
fore valid only down to q^ = 4-n/L, which for the curves 
shown equals 0.33 A -1 . The observed behaviour of the scat- 



FIG. 2. The two contributions of equation (16). The first term produces 
the upper solid curve and the second term produces the lower solid 
curve; the latter is plotted with a negative sign in order to facilitate 
the comparison. Subtracting the lower curve from the upper yields the 
scattering curve corrected for the finite-size effects; it is shown here 
in a dashed line. 



tering curves in figure 1 is in good agreement with this value, 
exemplifying the fact that differences in the results below q m \ n 
are actually to be expected due to the different methods used. 
We can conclude that the RL and CS give almost identical 
results, which affirms the conceptual consistency of the two 
approaches. 

The BF, on the other hand, gives a somewhat higher inten- 
sity with slight superimposed oscillations which increase in 
amplitude, particularly at lower ^-values. Both characteristics 
were observed to be regular features of this method. 

According to equation (16), the scattering is composed of 
two additive contributions: one due to the explicit part of the 
system and the other due to the averaged surroundings. To 
illustrate the high importance of the latter, both contributions 
are plotted separately in figure 2, together with the resulting 
scattering curve. One can see that the finite-size effects are 
drastic — the maximum value of the first term is about three 
orders of magnitude larger than the peak value of the final 
scattering curve. For finite-size simulated systems, the Debye 
equation is clearly not usable without an additional correcting 
term. 

Another set of results, this time focusing on the method 
based on Fourier transforms of the pair distribution functions, 
is shown in figure 3 together with the results of the CS. For 
each FT scattering curve, a different bin width was used when 
constructing the pair distribution functions, in order to assess 
its effects on the final result. Excepting the case of the largest 
bin width, the curves are very similar and a close up is shown 
in figure 4 to reveal the differences. As expected, the results of 
the FT method approach the results of the CS as the bin width 
decreases (these two methods theoretically become identical 
in the limit when bin width approaches zero). However, it 
is interesting to note that this asymptotic approach does not 
proceed smoothly from one side; instead the results of the FT 
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FIG. 3. The scattering curves of pentanal calculated via the comple- 
mented system approach (solid line) and via Fourier-transformation of 
the pair distribution functions. Several series of pair distribution func- 
tions were generated, each with a different binning width: 0.5 A (long 
dashed), 0.2 A (short dashed), 0. 1 A (dotted) and 0.05 A (dot-dashed). 
See also figure 4 for a close up. 
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FIG. 4. Close up of figure 3. Bin widths used: 0.5 A (long dashed), 
0.2 A (short dashed), 0.1 A (dotted) and 0.05 A (dot-dashed). The 
result of the complemented system approach is shown as a solid line. 



fluctuate around the limiting value with an ever decreasing 
amplitude. 

This observation reveals a plausible use for the CS even in 
cases where many similar calculations must be made quickly, 
which usually makes the FT the preferred method due to its 
speed. The CS can be viewed as an optimal case of the FT (bin 
width infinitely small) and its results can therefore be used to 
gauge the effect of bin width in order to select an acceptable 
value in a particular context. When speed is not the limiting 
factor, the CS can be used directly to get an optimal result 
immediately. 

Indeed, the computational efficiency is usually an important 
consideration in selection of a method. To provide a compari- 
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FIG. 5. Computational efficiency of the methods. The diagram shows 
the time required to process a configuration with a certain number of 
particles. Symbols denote: RL (squares), BF (triangles), FT (empty 
circles), CS (filled circles). 

son between the efficiencies of the methods used, a series of 
test runs was conducted solely for that purpose. Even though 
the actual run times of the programs might not be directly 
applicable in general due to their dependence on the implemen- 
tation, the compiler/interpreter and the underlying hardware, 
we feel that on a relative scale, the results are telling enough. 

The relation between the number of particles in a configura- 
tion, N, and the time needed to process it is shown in figure 5. 
An immediately apparent fact is that the time complexity of 
the RL method is 0(N) in contrast to the other three methods 
which share a 0(N 2 ) time complexity. Thus, for a large enough 
number of particles, the RL method will always be the most 
efficient — at a price of having fixed g-values and exhibiting 
somewhat larger statistical uncertainties in the resulting curves. 
Amongst the other methods, FT is the fastest, followed by CS 
(approximately four times slower) and BF (approximately 200 
times slower than FT). 

It should be stressed that the majority of the time needed 
in an FT calculation is spent gathering the pair distribution 
functions; the calculation of scattering actually takes a negligi- 
ble amount of time which is also independent of N. Also, the 
binning width and the density of g-points do not affect the run 
times appreciably; they do affect the memory usage, though. 

V. CONCLUSION 

In this paper, we presented a novel method for calculating 
the X-ray scattering of a (simulated) model system. It operates 
directly on particle configurations and thus avoids the need 
to calculate the pair distribution functions: in this way, all 
the information contained in the configuration is preserved. 
Correlations over distances larger than some cut-off distance 
(in the usual case of a computer simulation, half of the cell's 
side length) are handled in a best-effort way by setting the 
pair distribution functions to their theoretical limiting value: 
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such an approximation effectively "masks" the finiteness of 
the system by complementing the missing surroundings of 
each particle with an average image of the system. We showed 
that this approach can in fact be regarded as a special case of 
the widely used Fourier transform method, giving directly the 
results that the latter method would produce in the limit of 
an infinitely small bin width. Numerical tests showed that the 
results also compare favourably to the results of other known 
methods. 
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